We study publication bias in the social sciences by analyzing a known population of 
conducted studies — 221 in total — where there is a full accounting of what is 
published and unpublished. We leverage TESS, an NSF-sponsored program where 
researchers propose survey-based experiments to be run on representative 
samples of American adults. Because TESS proposals undergo rigorous peer 
review, the studies in the sample all exceed a substantial quality threshold. Strong 
results are 40 percentage points more likely to be published than null results, and 
60 percentage points more likely to be written up. We not only provide direct 
evidence of publication bias but also identify the stage of research production at 
which it occurs: authors do not write up and submit null findings. 



Publication bias occurs when "publication of study results is based on 
the direction or significance of the findings" (1). One pernicious form of 
publication bias is the greater likelihood of statistically significant results 
being published than statistically insignificant results, holding fixed 
research quality. Selective reporting of scientific findings is often 
referred to as the "file drawer" problem (2). Such a selection process 
increases the likelihood that published results reflect Type I errors rather 
than true population parameters, biasing effect sizes upwards. Further, it 
constrains efforts to assess the state of knowledge in a field or on a 
particular topic, since null results are largely unobservable to the scholarly 
community. 
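The upward bias described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the true effect, the standard error, the number of studies, and the rule that only estimates with |z| > 1.96 are published are all hypothetical values chosen for illustration, not quantities taken from this study.

```python
import random
import statistics

random.seed(0)  # fixed seed so the illustration is repeatable

# All three values below are hypothetical, chosen only for illustration.
TRUE_EFFECT = 0.1      # true population effect
STANDARD_ERROR = 0.2   # standard error of each study's estimate
N_STUDIES = 5000       # number of conducted studies

# Each conducted study yields one noisy estimate of the true effect.
estimates = [random.gauss(TRUE_EFFECT, STANDARD_ERROR) for _ in range(N_STUDIES)]

# Stylized "file drawer" rule: only estimates that are statistically
# significant at the 5% level (|z| > 1.96) reach the published literature.
published = [e for e in estimates if abs(e / STANDARD_ERROR) > 1.96]

mean_all = statistics.mean(estimates)
mean_published = statistics.mean(published)

print(f"conducted studies: {len(estimates)}, published: {len(published)}")
print(f"mean estimate, all studies:       {mean_all:.3f}")
print(f"mean estimate, published studies: {mean_published:.3f}")
```

Under these assumptions the mean of all conducted studies stays close to the true effect, while the mean of the published subset is considerably larger, because an estimate must overshoot the true effect in order to clear the significance threshold.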

Publication bias has been documented in various disciplines within 
the biomedical (3-9) and social sciences (10-17). One common method 
of detecting publication bias is replicating a meta-analysis with and 
without unpublished literature (18). This approach is limited because 
much of what is unpublished is unobserved. Other methods solely examine 
the published literature and rely on assumptions about the distribution 
of unpublished research by, for example, comparing the precision 
and magnitude of effect sizes among a group of studies. In the presence 
of publication bias, smaller studies report larger effects in order to exceed 
arbitrary significance thresholds (19, 20). However, these visualization-based 
approaches are sensitive to using different measures of precision 
(21, 22) and also assume outcome variables and effect sizes are comparable 
across studies (23). Finally, methods that compare published studies 
to "grey" literatures (e.g., dissertations, working papers, conference 
papers, human subjects registries) may confound strength of results with 
research quality (7). These techniques are also unable to determine 
whether publication bias occurs at the editorial stage or during the writing 
stage. Editors and reviewers may prefer statistically significant results 
and reject sound studies that fail to reject the null hypothesis. 
Anticipating this, authors may not write up and submit papers that have 
null findings. Or, authors may have their own preferences to not pursue 
the publication of null results. 

A different approach involves examining the publication outcomes 
of a cohort of studies, either prospectively or retrospectively (24, 25). 
Analyses of clinical registries and abstracts submitted to medical conferences 
consistently find little to no editorial bias against studies with null 
findings (26-31). Instead, failure to publish appears to be most strongly 
related to authors' perceptions that negative or null results are uninteresting 
and not worthy of further analysis or publication (32-35). One analysis of 
all IRB-approved studies at a single university over two years found that a 
majority of conducted research was never submitted for publication or peer 
review (36). 

Surprisingly, similar cohort analyses are much rarer in the social sciences. 
There are two main reasons for this lacuna. First, there is no process in the 
social sciences of pre-registering studies comparable to the clinical trials 
registry in the biomedical sciences. Second, even if some unpublished studies 
could be identified, there are likely to be substantial quality differences 
between published and unpublished studies that make them difficult to 
compare. As noted, previous research attempted to identify unpublished 
results by examining conference papers and dissertations (37) and human 
subjects registries of single institutions (36). However, such techniques may 
produce unrepresentative samples of unpublished research, and the 
strength of the results may be confounded with research quality. Conference 
papers, for example, do not undergo peer review comparable to that of 
journal articles in the social sciences and therefore cannot be 
used as a comparison set. This paper is unique in the study of publication 
bias in the social sciences in that it analyzes a known population of 
conducted studies and all studies in the population exceed a substantial 
quality threshold. 

We leverage TESS (Time-sharing Experiments in the Social Sciences), 
an NSF-sponsored program established in 2002 where researchers 
propose survey-based experiments to be run on nationally representative 
samples. These experiments typically embed some randomized manipulation 
(e.g., visual stimulus, question wording difference) within a survey 
questionnaire. Researchers apply to TESS, which then peer reviews 
the proposals and distributes grants on a competitive basis (38). Our 
basic approach is to compare the statistical results of TESS experiments 
that eventually got published to the results of those that remain 
unpublished. 

This analytic strategy has many advantages. First, we have a known 
population of conducted studies, and therefore have a full accounting of 
what is published and unpublished. Second, TESS proposals undergo 
rigorous peer review, meaning that even unpublished studies exceed a 
substantial quality threshold before they are conducted. Third, nearly all 
of the survey experiments were conducted by the same high-quality 
survey research firm (Knowledge Networks, now known as GfK Custom 
Research), which assembles probability samples of Internet panelists by 
recruiting participants via random digit dialing and address-based sampling. 
Thus, there is remarkable similarity across studies with respect to 
how they were administered, allowing for comparability. Fourth, TESS 
requires that studies have requisite statistical power, meaning that the 
failure to obtain statistically significant results is not simply due to 
insufficient sample size. 

One potential concern is that TESS studies may be unrepresentative 
of social science research, especially scholarship based on non-experimental 
data. While TESS studies are clearly not a random sample 
of the research conducted in the social sciences, it is unlikely that publication 
bias in other social science research is less severe than what is reported 
here. The baseline probability of publishing experimental findings based on 
representative samples is likely higher than that of observational studies 
using "off-the-shelf" datasets or experiments conducted on convenience samples 
where there is lower "sunk cost" involved in obtaining the data. Because the 
TESS data were collected at considerable expense (in terms of time to 
obtain the grant), authors should, if anything, be more motivated to 
attempt to publish null results. 

The initial sample consisted of the entire online archive of TESS 
studies as of January 1, 2014 (39). We analyzed studies conducted between 
2002 and 2012. We did not track studies conducted in 2013 because 
there had not been enough time for the authors to analyze the data 
and proceed through the publication process. The 249 studies represent a 
wide range of social science disciplines (see Table 1). Our analysis was 
restricted to 221 studies (89% of the initial sample). We excluded seven 
studies published in book chapters, and 21 studies for which we were 
unable to determine the publication status and/or the strength of experimental 
findings (40). The full sample of studies is presented in Table 2; 
the bolded entries represent the analyzed subsample of studies. 

The outcome of interest is the publication status of each TESS experiment. 
We took numerous approaches to determine whether the results 
from each TESS experiment appeared in a peer-reviewed journal, 
book, or book chapter. We first conducted a thorough online search for 
published and unpublished manuscripts, and read every manuscript to 
verify that it relied on data collected through TESS and that it reported 
experimental results (40). We then emailed the authors of over 100 studies 
for which we were unable to find any trace of the study and asked 
what happened to their studies. We also asked authors who did not provide 
a publication or working paper to summarize the results of their 
experiments. 

The outcome variable distinguishes between two types of unpublished 
experiments: those prepared for submission to a conference or 
journal, and those never written up in the first place. It is also possible 
that papers with null results may be excluded from the very top journals 
but still find their way into the published literature. Thus, we disaggregated 
published experiments based on their placement in top-tier or 
non-top-tier journals (40) (see Table S1 for a list of journal classifications). 
The results from the majority of TESS studies in our analysis sample 
have been written up (80%), while less than half (48%) have been 
published in academic journals. 

We also ascertained whether the results of each experiment are described 
as statistically significant by their authors. We did not analyze 
the data ourselves to determine if the findings were statistically significant 
for two main reasons. First, it is often very difficult to discern the 
exact analyses the researchers intended. The proposals that authors submit 
to TESS are not a matter of public record, and many experiments 
have complex experimental designs with numerous treatment conditions, 
outcome variables, and moderators. Second, what is most important is 
whether the authors themselves consider their results to be significant, as 
this influences how they present their results to editors and reviewers, as 
well as whether they decide to write a paper. Studies were classified into 
three categories of results: strong (all/most hypotheses were supported 
by the statistical tests), null (all/most hypotheses were not supported), 
and mixed (remainder of studies) (40). Approximately 41% of the studies 
in our analysis sample reported strong evidence in favor of the stated 
hypotheses, 37% reported mixed results, and 22% reported null results. 
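The marginal shares quoted in this and the preceding paragraph can be recomputed from the counts in Table 2. The sketch below assumes that the analyzed subsample consists of the null, mixed, and strong rows crossed with the three non-missing, non-book-chapter publication statuses; that reading is an inference from the fact that those nine cells sum to 221, and the dictionary keys are illustrative names.

```python
# Counts from Table 2, restricted to the cells assumed to form the
# analysis sample (no book chapters, no missing status or results).
counts = {
    "null":   {"not_written": 31, "written_unpublished": 7,  "published": 10},
    "mixed":  {"not_written": 10, "written_unpublished": 32, "published": 40},
    "strong": {"not_written": 4,  "written_unpublished": 31, "published": 56},
}

total = sum(sum(row.values()) for row in counts.values())

# Share of studies in each results category.
strength_share = {name: sum(row.values()) / total for name, row in counts.items()}

# Studies that were written up (published or not) and studies that were published.
written = sum(row["written_unpublished"] + row["published"] for row in counts.values())
published = sum(row["published"] for row in counts.values())

print(f"analysis sample: {total}")
for name, share in strength_share.items():
    print(f"{name:>6}: {share:.1%}")
print(f"written up: {written / total:.1%}, published: {published / total:.1%}")
```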

There is a strong relationship between the results of a study and 
whether it was published, a pattern indicative of publication bias. The 
main findings are presented in Table 3, which is a cross-tabulation of 
publication status against strength of results. A Pearson chi-squared test 
easily rejects independence [χ²(6) = 80.3, P < 0.001], implying 
that there are clear differences in the statistical results between published 
and unpublished studies. While around half of the total studies in our 
sample were published, only 20% of those with null results appeared in 
print. In contrast, roughly 60% of studies with strong results and 50% of 
those with mixed results were published. Although more than 20% of the 
studies in our sample had null findings, less than 10% of published articles 
based on TESS experiments report such results. While the direction 
of these results may not be surprising, the observed magnitude (an approximately 
40 percentage point increase in the probability of publication 
when moving from null to strong results) is remarkably large. 
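The test statistic and the 40 percentage point gap can be checked with a few lines of standard-library Python. One caveat: Table 2 reports published studies as a single count, so the split into non-top-tier and top-tier journals used here is an assumption, back-calculated from the column percentages in Table 3. The closed-form tail probability is valid only for even degrees of freedom, which holds for this 4 x 3 table.

```python
import math

# Rows: publication status; columns: null, mixed, strong results.
# The first two rows are counts from Table 2. The last two rows are
# ASSUMED counts, back-calculated from the Table 3 column percentages.
observed = [
    [31, 10, 4],   # not written
    [7, 32, 31],   # written but not published
    [5, 31, 35],   # published, non-top-tier (assumed split)
    [5, 9, 21],    # published, top-tier (assumed split)
]

def pearson_chi2(table):
    """Return the Pearson chi-squared statistic and its degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

def chi2_sf_even_dof(x, dof):
    """Upper-tail chi-squared probability; closed form valid for even dof only."""
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(dof // 2))

stat, dof = pearson_chi2(observed)
p_value = chi2_sf_even_dof(stat, dof)

# Publication rate by results category (both journal tiers combined).
col_totals = [sum(col) for col in zip(*observed)]
published_share = [(observed[2][j] + observed[3][j]) / col_totals[j] for j in range(3)]
gap = published_share[2] - published_share[0]

print(f"chi2({dof}) = {stat:.1f}, p = {p_value:.2g}")
print(f"published: null {published_share[0]:.1%}, strong {published_share[2]:.1%}, gap {gap:.1%}")
```

With the assumed tier split, the statistic agrees with the reported value to one decimal place, which suggests the back-calculated counts are at least consistent with the published table.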

However, what is perhaps most striking in Table 3 is not that so few 
null results are published, but that so many of them are never even written 
up (65%). The failure to write up null results is problematic for two 
reasons. First, researchers might be wasting effort and resources in conducting 
studies that have already been executed where the treatments 
were not efficacious. Second, and more troubling, if future researchers 
conduct similar studies and obtain significant results by chance, then the 
published literature on the topic will erroneously suggest stronger effects. 
Hence, even if null results are characterized by treatments that "did 
not work" and strong results are characterized by efficacious treatments, 
authors' failure to write up null findings still adversely affects the universe 
of knowledge. Interestingly, once we condition on studies that 
were written up, there is no significant relationship between strength of 
results and publication status (see Table S2). 
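The conditional result can be illustrated by restricting the Table 2 counts to studies that were written up. This is a simplified sketch, not a reproduction of Table S2: it collapses journal tiers into a single "published" row, which may differ from the specification the authors used.

```python
import math

# Written-up studies only; columns are null, mixed, strong (counts from Table 2).
written_up = [
    [7, 32, 31],    # written but not published
    [10, 40, 56],   # published (both journal tiers combined)
]

def pearson_chi2(table):
    """Return the Pearson chi-squared statistic and its degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = sum(
        (obs - row_totals[i] * col_totals[j] / n) ** 2 / (row_totals[i] * col_totals[j] / n)
        for i, row in enumerate(table)
        for j, obs in enumerate(row)
    )
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

def chi2_sf_even_dof(x, dof):
    """Upper-tail chi-squared probability; closed form valid for even dof only."""
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(dof // 2))

stat, dof = pearson_chi2(written_up)
p_value = chi2_sf_even_dof(stat, dof)
print(f"chi2({dof}) = {stat:.2f}, p = {p_value:.2f}")
```

In this collapsed table the statistic is small relative to its degrees of freedom and the p-value is far above conventional thresholds, consistent with the claim that the selection happens at the write-up stage.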

A series of additional analyses demonstrates the robustness of our results. 
Estimates from multinomial probit regression models show that 
studies with null findings are significantly less likely to be written up 
even after controlling for researcher quality (using the highest quality 
researcher's cumulative h-index and the number of publications at the 
time the study was run), discipline of the lead author, and the date the 
study was conducted (see online supplementary text and Table S3). Further, 
the relationship between strength of results and publication status 
does not vary across levels of these covariates (see online supplementary 
text and Tables S4 and S5). Another potential concern is that our coding 
of the statistical strength of results is based on author self-reports, 
introducing the possibility of measurement error and misclassification. A 
sensitivity analysis shows that our findings are robust to even dramatic 
and unrealistic rates of misclassification (see online supplementary text 
and Figure S1). 
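To give a feel for what such a sensitivity check involves, the sketch below asks how the write-up gap between strong and null studies would change if a given share of those studies swapped labels at random. This is a hypothetical error model chosen for illustration, not the authors' procedure, which is described only in the supplementary text; the function and variable names are likewise illustrative.

```python
# Analysis-sample counts from Table 2 as (written up, not written up).
NULL = (17, 31)     # 7 written but unpublished + 10 published; 31 not written
STRONG = (87, 4)    # 31 written but unpublished + 56 published; 4 not written

def expected_write_up_gap(flip_rate):
    """Expected strong-minus-null write-up rate if a share `flip_rate` of null
    and strong studies swapped labels at random (hypothetical error model)."""
    def write_up_rate(keep, swap):
        written = (1 - flip_rate) * keep[0] + flip_rate * swap[0]
        total = (1 - flip_rate) * sum(keep) + flip_rate * sum(swap)
        return written / total
    return write_up_rate(STRONG, NULL) - write_up_rate(NULL, STRONG)

for rate in (0.0, 0.1, 0.2, 0.3):
    print(f"flip rate {rate:.0%}: expected gap {expected_write_up_gap(rate):.1%}")
```

Under this model the gap shrinks as labels become noisier and vanishes only when labels are uninformative (a flip rate of one half), so a large gap survives even substantial random relabeling.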

Why do some researchers choose not to write up null results? To 
provide some initial explanations, we classified 26 detailed email responses 
we received from researchers whose studies yielded null results 
and who did not write a paper (see Table S6). Fifteen of these authors reported 
that they abandoned the project because they believed that null results 
have no publication potential even if they found the results interesting 
personally (e.g., "I think this is an interesting null finding, but given the 
discipline's strong preference for p < .05, I haven't moved forward with 
it"). Nine of these authors reacted to null findings by reducing the priority 
of writing up the TESS study and focusing on other projects (e.g., 
"There was no paper unfortunately. There still may be in future. The 
findings were pretty inconclusive."). Perhaps most interestingly, two 
authors whose studies "didn't work out" eventually published papers 
supporting their initial hypotheses using findings obtained from smaller 
convenience samples. 

How can the social science community combat publication bias of 
this sort? Based on communications with the authors of many experiments 
that resulted in null findings, we found that some researchers anticipate 
the rejection of such papers but also that many of them simply 
lose interest in "unsuccessful" projects. These findings show that a vital 
part of developing institutional solutions to improve scientific transparency 
would be to understand better the motivations of researchers who 
choose to pursue projects as a function of results. 

Few null findings ever make it to the review process. Hence, proposed 
solutions such as two-stage review (the first stage for the design 
and the second for the results), pre-analysis plans (41), and requirements 
to pre-register studies (16) should be complemented by incentives not to 
bury insignificant results in file drawers. Creating high-status publication 
outlets for these studies could provide such incentives. The movement 
toward open-access journals may provide space for such articles. Further, 
the pre-analysis plans and registries themselves will increase researcher 
access to null results. Alternatively, funding agencies could 
impose costs on investigators who do not write up the results of funded 
studies. Finally, resources should be deployed for replications of published 
studies if they are unrepresentative of conducted studies and more 
likely to report large effects. 

Table 1. Distribution of studies across years and disciplines. Note: Field coded based on the affiliation of the first author. "Other" 
category includes: Business, Computer Science, Criminology, Education, Environmental Studies, Journalism, Law, and Survey 
Methodology. 

Year   Communication  Economics  Political Science  Public Health  Psychology  Sociology  Other  Total
2002               0          0                  1              0           0          0      0      1
2003               0          1                  4              0           6          2      1     14
2004               0          2                  9              1           5          0      0     17
2005               2          2                 13              0          10          7      1     35
2006               3          1                 12              1           9          6      0     32
2007               0          0                  5              0           3          2      0     10
2008               2          0                 11              1           4          2      1     21
2009               0          0                 12              1           8          2      3     26
2010               3          3                 22              0           5          6      2     41
2011               2          0                 19              1           9          6      2     39
2012               1          1                  5              1           1          3      1     13
Total             13         10                113              6          60         36     11    249

Table 2. Cross-tabulation between statistical results of TESS studies and their publication status. Note: Entries are counts of studies 
by publication status and results. Bolded entries indicate observations included in the final sample for analysis (40). Results are robust to the 
inclusion of book chapters (see Table S7). 

                Unpublished, not written  Unpublished, written  Published  Book chapter  Missing  Total
Null results                          31                     7         10             1        0     49
Mixed results                         10                    32         40             3        1     86
Strong results                         4                    31         56             1        1     93
Missing                                6                     1          0             2       12     21
Total                                 51                    71        106             7       14    249



Table 3. Cross-tabulation between statistical results of TESS studies and their publication status (column percentages 
reported). Pearson χ² test of independence: χ²(6) = 80.3, P < 0.001. 

                             Null   Mixed  Strong
Not written                 64.6%   12.2%    4.4%
Written but not published   14.6%   39.0%   34.1%
Published (non-top-tier)    10.4%   37.8%   38.5%
Published (top-tier)        10.4%   11.0%   23.1%
Total                      100.0%  100.0%  100.0%